153 research outputs found

    Robust Income Distribution Estimation with Missing Data

    With income distributions it is common to encounter the problem of missing data. When a parametric model is fitted to the data, the problem can be overcome by specifying the marginal distribution of the observed data. With classical estimation methods such as maximum likelihood (ML), an estimator of the parameters can be obtained in a straightforward manner. Unfortunately, it is well known that ML estimators are not robust in the presence of contaminated data. In this paper, we propose a robust alternative to the ML estimator with truncated data, namely one based on M-estimators that we call the EMM estimator. We present an extensive simulation study where the EMM estimator based on optimal B-robust estimators (OBRE) is compared to a more conservative approach based on marginal density (MD) for truncated data, and show that the difference lies in the way the weights associated with each observation are computed. Finally, we also compare the EMM estimator based on the OBRE with the classical ML estimator when the data are contaminated, and show that, contrary to the former, the latter can be seriously biased.
    Keywords: M-estimators, influence function, EM algorithm, truncated data.

    Robust inference with binary data

    In this paper, the robustness properties of the maximum likelihood estimator (MLE) and of several robust estimators for the logistic regression model with binary responses are analysed. It is found that the MLE and the classical Rao's score test can be misleading in the presence of model misspecification, which in the context of logistic regression means either misclassification errors in the responses or extreme data points in the design space. A general framework for robust estimation and testing is presented, and a robust estimator and a robust testing procedure are proposed. It is shown that they are less influenced by model misspecification than their classical counterparts. They are finally applied to the analysis of binary data from a study on breastfeedin

    Bounded-Bias Robust Estimation in Generalized Linear Latent Variable Models

    This paper proposes a robust estimator for a general class of linear latent variable models (GLLVM) (Moustaki and Knott 2000, Bartholomew and Knott 1999). It is based on a weighted score function that is simple to implement numerically and is made consistent using the basic idea of indirect inference. The need for a robust estimator for these models is motivated by a study of the effect of model deviations, such as data contamination, on the maximum likelihood estimator (MLE). This is done using the influence function (Hampel 1968, 1974) and the gross error sensitivity (Hampel, Ronchetti, Rousseeuw, and Stahel 1986). Simulation studies show that the MLE can be seriously biased by model deviations. The performance of the robust estimator in terms of bias and variance is compared with that of the MLE in simulation studies and on a real example from a consumption survey.
    Keywords: latent variable models, mixed items, influence function, robust estimation, indirect inference.

    A Latent Variable Approach for the Construction of Continuous Health Indicators

    In most health surveys, the state of health of individuals is measured through several different kinds of variables, such as qualitative, discrete quantitative or dichotomous ones. From these variables, one aims to build univariate indices of health that summarize the information. To do so, we propose in this paper to use Generalized Linear Latent Variable Models (GLLVM) (see e.g. Bartholomew and Knott 1999), which allow one to estimate one or more continuous latent variables from a set of observable ones. As an application, we consider the data from the 1997 Swiss Health Survey and build two health indicators. The first one describes the health status induced merely by the age of the subject, and the second one complements the first one.

    Zero-inflated truncated generalized Pareto distribution for the analysis of radio audience data

    Extreme value data with a large clump at zero occur in many domains. Moreover, the observed data may be truncated below a given threshold, or may not be reliable enough below that threshold because of the recording devices. These situations occur, in particular, with radio audience data measured using personal meters that record environmental noise every minute, which is then matched to one of several radio programs. There are therefore genuine zeros for respondents not listening to the radio, but also zeros corresponding to real listeners for whom the match between the recorded noise and the radio program could not be achieved. Since radio audiences are important for radio broadcasters, for example to determine advertisement price policies, possibly according to the type of audience at different time points, it is essential to be able to explain not only the probability of listening to the radio but also the average time spent listening to the radio by means of the characteristics of the listeners. In this paper we propose a generalized linear model for the zero-inflated truncated Pareto distribution (ZITPo) that we use to fit radio audience data. Because it is based on the generalized Pareto distribution, the ZITPo model has nice properties: it is invariant to the choice of the threshold, and it yields a natural residual measure for assessing the fit of the model to the data. From a general formulation of the most popular models for zero-inflated data, we derive our model by considering successively the truncated case, the generalized Pareto distribution and then the inclusion of covariates to explain the nonzero proportion of listeners and their average listening time. By means of simulations, we study the performance of the maximum likelihood estimator (and derived inference) and use the model to fully analyze the audience data of a radio station in a certain area of Switzerland.
    Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS358 by the Institute of Mathematical Statistics (http://www.imstat.org
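    The threshold invariance mentioned in this abstract can be illustrated with the standard threshold-stability property of the generalized Pareto distribution. The following is a short derivation under an assumed parametrization (scale $\sigma > 0$, shape $\xi \neq 0$, survival function $\bar F$); the paper's own notation and the exact form of the ZITPo model may differ.

```latex
% Assumed parametrization: Y ~ GPD(sigma, xi) with survival function
\bar F(y) = \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi}, \qquad y \ge 0 .

% For a threshold u > 0 with 1 + \xi u / \sigma > 0, the excess Y - u given Y > u satisfies
P(Y - u > y \mid Y > u)
  = \frac{\bar F(u + y)}{\bar F(u)}
  = \left(\frac{\sigma + \xi (u + y)}{\sigma + \xi u}\right)^{-1/\xi}
  = \left(1 + \frac{\xi y}{\sigma + \xi u}\right)^{-1/\xi} .

% Hence the excess is again generalized Pareto, with the same shape and a shifted scale:
Y - u \mid Y > u \;\sim\; \mathrm{GPD}(\sigma + \xi u,\; \xi) .
```

    Under this parametrization, moving the threshold changes only the scale parameter in a known way and leaves the shape parameter unchanged.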

    Robust estimation of personal income distribution models

    Statistical problems in modelling personal income distributions include estimation procedures, testing and model choice. Typically, the parameters of a given model are estimated by classical procedures such as maximum likelihood and least squares estimators. Unfortunately, the classical methods are very sensitive to model deviations such as gross errors in the data, grouping effects or model misspecifications. These deviations can ruin the values of the estimators and inequality measures and can produce false information about the distribution of personal income in a given country. In this paper we discuss the use of robust techniques for the estimation of income distributions. These methods behave like the classical procedures at the model but are less influenced by model deviations and can be applied to general estimation problems.
    Keywords: personal income distribution, inequality measures, parametric models, influence function, M-estimator.
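    The sensitivity of classical estimators to gross errors can be seen in a small sketch. It uses a Huber-type M-estimator of location, which is a textbook stand-in and not the estimator studied in the paper; the function name `huber_location`, the tuning constant `c = 1.345`, and the toy data are all assumptions made for illustration.

```python
import statistics


def huber_location(data, c=1.345, tol=1e-9, max_iter=100):
    """Huber-type M-estimate of location via iteratively reweighted means.

    Illustrative sketch only: the scale is fixed at the rescaled MAD and
    c = 1.345 is a conventional (assumed) tuning constant.
    """
    mu = statistics.median(data)
    # Median absolute deviation, rescaled to be consistent at the normal model.
    mad = statistics.median([abs(x - mu) for x in data])
    scale = 1.4826 * mad if mad > 0 else 1.0
    for _ in range(max_iter):
        # Observations within c*scale of mu get weight 1; others are downweighted.
        weights = [
            1.0 if abs(x - mu) <= c * scale else c * scale / abs(x - mu)
            for x in data
        ]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical toy data: eight clean values near 10 plus one gross error.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
contaminated = clean + [1000.0]

classical = statistics.mean(contaminated)  # pulled far away by the outlier
robust = huber_location(contaminated)      # stays close to the bulk of the data
print(classical, robust)
```

    On clean data the two estimates are close to each other; with the single gross error, only the sample mean moves substantially.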

    Distributional Dominance with Dirty Data

    Distributional dominance criteria are commonly applied to draw welfare inferences about comparisons, but conclusions drawn from empirical implementations of dominance criteria may be influenced by data contamination. We examine a non-parametric approach to refining Lorenz-type comparisons and apply the technique to two important examples from the LIS database.
    Keywords: distributional dominance, Lorenz curve, robustness.
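    How contamination can affect a dominance conclusion can be sketched with empirical Lorenz curves. This is a minimal illustration, not the refinement proposed in the paper; the helper names `lorenz_curve` and `lorenz_dominates` and the income vectors are hypothetical, and the comparison assumes equal sample sizes.

```python
def lorenz_curve(incomes):
    """Return empirical Lorenz ordinates L(0), L(1/n), ..., L(1)."""
    xs = sorted(incomes)
    total = sum(xs)
    cumulative = 0.0
    points = [0.0]
    for x in xs:
        cumulative += x
        points.append(cumulative / total)
    return points


def lorenz_dominates(a, b):
    """True if the Lorenz curve of a lies nowhere below that of b.

    Assumes len(a) == len(b), so ordinates are compared pointwise.
    """
    return all(la >= lb for la, lb in zip(lorenz_curve(a), lorenz_curve(b)))


# Hypothetical samples: a is more equally distributed than b.
a = [10, 20, 30, 40]
b = [5, 15, 30, 50]

# One gross error in a (for instance a misplaced decimal point).
a_dirty = [10, 20, 30, 4000]

print(lorenz_dominates(a, b))        # a dominates b on the clean data
print(lorenz_dominates(a_dirty, b))  # the ranking no longer holds
print(lorenz_dominates(b, a_dirty))  # and is in fact reversed
```

    A single contaminated observation is enough to reverse the ranking in this toy case, which is the kind of fragility the abstract refers to.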